Automatic Construction of the Finnish Parliament Speech Corpus
Abstract
Automatic speech recognition (ASR) systems require large amounts of transcribed speech data for training state-of-the-art deep neural network (DNN) acoustic models. Transcribed speech is a scarce and expensive resource, and ASR systems tend to underperform in domains where little training data is available. In this work, we open up a vast and previously unused resource of transcribed speech for Finnish by retrieving and aligning all the recordings and meeting transcripts from the web portal of the Parliament of Finland. Short speech-text segment pairs are retrieved from the audio and text material by using the Levenshtein algorithm to align the first-pass ASR hypotheses with the corresponding meeting transcripts. DNN acoustic models are trained on the automatically constructed corpus, and their performance is compared to that of models trained on a commercially available speech corpus. Model performance is evaluated on Finnish parliament speech, with the test set divided into seen and unseen speakers. Performance is also evaluated on broadcast speech to test the general applicability of the parliament speech corpus. We also study the use of meeting transcripts in language model adaptation to achieve additional gains in recognition accuracy on Finnish parliament speech.
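The alignment step mentioned in the abstract can be illustrated with a minimal sketch. It assumes word-level tokens, unit edit costs, and a hypothetical `min_len` threshold for keeping matched runs; the abstract specifies none of these, so this is an illustration of Levenshtein alignment in general, not the authors' pipeline. The function names `align_words` and `matching_runs` are invented for this example.

```python
from typing import List, Optional, Tuple

Pair = Tuple[Optional[str], Optional[str]]


def align_words(hyp: List[str], ref: List[str]) -> List[Pair]:
    """Align two word sequences with the Levenshtein algorithm.

    Returns (hyp_word, ref_word) pairs; None marks a gap on that side.
    Unit costs for substitution, insertion and deletion are an assumption.
    """
    n, m = len(hyp), len(ref)
    # dp[i][j] = edit distance between hyp[:i] and ref[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i
    for j in range(1, m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j - 1] + cost,  # match or substitution
                dp[i - 1][j] + 1,         # extra word in the hypothesis
                dp[i][j - 1] + 1,         # word missing from the hypothesis
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            if dp[i][j] == dp[i - 1][j - 1] + cost:
                pairs.append((hyp[i - 1], ref[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            pairs.append((hyp[i - 1], None))
            i -= 1
        else:
            pairs.append((None, ref[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


def matching_runs(pairs: List[Pair], min_len: int = 3) -> List[List[str]]:
    """Collect runs of consecutive exact matches of at least min_len words.

    min_len is a hypothetical threshold chosen for illustration only.
    """
    runs: List[List[str]] = []
    current: List[str] = []
    for h, r in pairs:
        if h is not None and h == r:
            current.append(r)
        else:
            if len(current) >= min_len:
                runs.append(current)
            current = []
    if len(current) >= min_len:
        runs.append(current)
    return runs
```

In a corpus-building setting, the word runs where hypothesis and transcript agree would be mapped back to the time stamps of the ASR hypothesis to cut out speech-text segment pairs; that mapping is outside this sketch.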
Similar resources
Automatic recognition of emotions in spoken Finnish: preliminary results and applications
In this paper, research on the automatic recognition of basic emotions in spoken Finnish is reported. The investigation was carried out utilizing the MediaTeam Emotional Speech corpus, which is currently the largest emotional speech database for Finnish. In this investigation, three experiments were carried out. In the first two experiments, mainly speaker-dependent automatic classification of ...
A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation
Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by making man-machine interaction more natural. Since speech is the most popular method of communication, recognizing human emotions from the speech signal has become a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...
Transcription System Using Automatic Speech Recognition for the Japanese Parliament (Diet)
This article describes a new automatic transcription system in the Japanese Parliament which deploys our automatic speech recognition (ASR) technology. To achieve high recognition performance in spontaneous meeting speech, we have investigated an efficient training scheme with minimal supervision which can exploit a huge amount of real data. Specifically, we have proposed a lightly-supervised t...
Allophone-based acoustic modeling for Persian phoneme recognition
Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation, which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighboring phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...
Building an ASR Corpus Using Althingi's Parliamentary Speeches
Acoustic data acquisition for under-resourced languages is an important and challenging task. In the Icelandic parliament, Althingi, all performed speeches are transcribed manually and published as text on Althingi’s web page. To reduce the manual work involved, an automatic speech recognition system is being developed for Althingi. In this paper the development of a speech corpus suitable for ...